Novel Organism Verification and Analysis (NOVA) study: identification of 35 clinical isolates representing potentially novel bacterial taxa using a pipeline based on whole genome sequencing

Background Reliable species identification of cultured isolates is essential in clinical bacteriology. We established a new study algorithm named NOVA (Novel Organism Verification and Analysis) to systematically analyze, using whole genome sequencing (WGS), bacterial isolates that cannot be characterized by the conventional identification procedures MALDI-TOF MS and partial 16S rRNA gene sequencing. Results We identified a total of 35 bacterial strains that represent potentially novel species. Corynebacterium sp. (n = 6) and Schaalia sp. (n = 5) were the predominant genera. Two strains each were identified within the genera Anaerococcus, Clostridium, Desulfovibrio, and Peptoniphilus, and one new species each was detected within Citrobacter, Dermabacter, Helcococcus, Lancefieldella, Neisseria, Ochrobactrum (Brucella), Paenibacillus, Pantoea, Porphyromonas, Pseudoclavibacter, Pseudomonas, Psychrobacter, Pusillimonas, Rothia, Sneathia, and Tessaracoccus. Twenty-seven of 35 strains were isolated from deep tissue specimens or blood cultures. Seven of the 35 isolated strains were clinically relevant. In addition, 26 bacterial strains that could only be identified at the species level using WGS analysis were mainly organisms that have been identified/classified very recently. Conclusion Our new algorithm proved to be a powerful tool for detection and identification of novel bacterial organisms. Publicly available clinical and genomic data may help to better understand their clinical and ecological role. Our identification of 35 novel strains, 7 of which appear to be clinically relevant, shows the wide range of undescribed pathogens yet to be defined. Supplementary Information The online version contains supplementary material available at 10.1186/s12866-023-03163-7.


Background
Species identification is the first and crucial step in the workflow of clinical microbiology as it provides essential guidance regarding treatment [1]. While the vast majority of pathogens isolated in clinical microbiology laboratories belong to well characterized species, a small number of bacterial isolates may not be reliably identified using conventional identification methods due to a lack of sufficient reference data or to the presence of a previously uncharacterized organism. In cases where the rapid Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF MS) methods do not provide a clear identification, molecular techniques are often used. The establishment of 16S rRNA gene sequence analysis has provided a simple and rapid method for species identification in such cases, and has led to the reclassification and renaming of numerous bacterial genera and species [2,3]. However, in some cases, analysis of the 16S rRNA gene sequence also fails to distinguish between species. In these cases, whole genome sequencing (WGS) can be used, which offers better resolution at the species level [1,4].
We have established an algorithm that uses WGS in a systematic approach to identify and characterize strains which are not identifiable by standard methods, i.e., MALDI-TOF MS and partial 16S rRNA gene sequence analysis. The aim of the study is to detect and characterize new bacterial organisms isolated from clinical specimens and to reliably identify difficult-to-identify strains. In this report, we describe 35 isolates that represent novel bacterial species, 7 of which were clinically relevant, as well as 26 strains (22 species) whose identification in the routine laboratory was problematic. We provide genome sequences of these species to expand the public database for taxonomic and epidemiological purposes, and we additionally present detailed clinical information about the patients and an assessment of the clinical relevance of the isolates to gain clinical and ecological knowledge about the novel bacterial species.

Methods
The Novel Organism Verification and Analysis (NOVA) study is a prospective study that uses WGS to characterize bacterial isolates that are not identifiable by routine diagnostic methods, and thereby to describe potential new species. The study was conducted at the Department of Clinical Bacteriology and Mycology of the University Hospital Basel, a tertiary care hospital in Switzerland, and was initiated in 2014. Here we present phenotypic and molecular data on bacterial isolates as well as clinical information on the patients for the period from December 2014 to January 2022. Isolates that qualified for the NOVA study were identified using a specific algorithm that was integrated into the routine diagnostic process (Fig. 1).

Description of the NOVA algorithm
Microscopy and aerobic and anaerobic cultures from the various clinical specimens were performed according to standard microbiological procedures, including enrichment culture using thioglycolate medium. Anaerobic cultures were incubated and manipulated in an anaerobic workstation (Whitley A 95, Don Whitley Scientific Ltd., Bingley, UK). Species identification of bacterial isolates from routine culture procedures was conducted by MALDI-TOF MS (Bruker Daltonics GmbH, Bremen, Germany) using a simple smear technique with a 1-µl formic acid overlay and α-cyano-4-hydroxycinnamic acid (CHCA) matrix solution. Measurements were analyzed with the main spectra library of the Bruker Daltonics database. If no reliable species identification was achieved with MALDI-TOF MS (i.e., score < 2.0, divergent results on the first and second hit, a species that is not validly published, e.g., Corynebacterium lipophilic group F1, or no identification at species level), isolates were subsequently analyzed using partial 16S rRNA gene PCR and sequence analysis of approximately 800 bp of the first part of the gene [5]. The resulting sequences were compared to the 16S rRNA gene sequence nucleotide databases of the National Center for Biotechnology Information (NCBI) network service (https://blast.ncbi.nlm.nih.gov). If seven or more mismatches/gaps (corresponding to ≤ 99.0% nucleotide identity) were identified in the analyzed sequence compared to the closest correctly described bacterial species, the isolates were included in the NOVA study (Fig. 1). A species was considered correctly described if it was designated as validly published in the List of Prokaryotic names with Standing in Nomenclature (LPSN) of the German strain collection database (https://www.bacterio.net) [6].
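The inclusion steps described above can be sketched as a small decision function. This is only an illustration of the stated criteria, not the laboratory's actual software: the class and function names (MaldiResult, maldi_is_reliable, triage) and the returned labels are hypothetical, and only the two thresholds (MALDI-TOF MS score < 2.0; seven or more mismatches/gaps in the partial 16S rRNA gene sequence) are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Thresholds taken from the description of the NOVA algorithm in the text.
MALDI_SCORE_CUTOFF = 2.0   # a score below 2.0 counts as unreliable
MISMATCH_CUTOFF = 7        # seven or more mismatches/gaps vs. closest valid species


@dataclass
class MaldiResult:
    """Hypothetical container for one MALDI-TOF MS identification result."""
    score: float
    top_hits_agree: bool      # first and second hit name the same species
    validly_published: bool   # species name is validly published
    species_level: bool       # identification reaches species level


def maldi_is_reliable(result: MaldiResult) -> bool:
    """Return True only if none of the listed failure criteria applies."""
    return (
        result.score >= MALDI_SCORE_CUTOFF
        and result.top_hits_agree
        and result.validly_published
        and result.species_level
    )


def triage(result: MaldiResult, mismatches_16s: Optional[int] = None) -> str:
    """Return the next step for an isolate (labels are illustrative)."""
    if maldi_is_reliable(result):
        return "routine identification"
    if mismatches_16s is None:
        # MALDI-TOF MS failed and no 16S rRNA result is available yet.
        return "partial 16S rRNA gene sequencing"
    if mismatches_16s >= MISMATCH_CUTOFF:
        return "include in NOVA study (WGS)"
    return "identified by 16S rRNA gene sequencing"


if __name__ == "__main__":
    unclear = MaldiResult(score=1.7, top_hits_agree=True,
                          validly_published=True, species_level=True)
    print(triage(unclear))                     # 16S rRNA step is still pending
    print(triage(unclear, mismatches_16s=9))   # qualifies for NOVA
```

The sketch works on the mismatch count directly rather than on a computed percent identity, because the exact aligned length ("approximately 800 bp") varies from isolate to isolate.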

Evaluation of clinical relevance by infectious diseases specialists
Patient data were retrospectively extracted from medical records, and the microbiological findings were evaluated individually along with the patient's clinical presentation by an infectious disease specialist. Clinical relevance was estimated on the basis of the following criteria: clinical signs and symptoms, presence of concomitant pathogens, pathogenic potential of the genus of the isolate, and clinical plausibility. The impact on patient care in terms of antibiotic use or antibiotic switching was not investigated in our study.

Results
A total of 61 isolates, 41 (67%) Gram-positive and 20 (33%) Gram-negative strains, were not identifiable using routine methods and were included in the NOVA study within the study period. Thirty-five (57%) organisms were identified as novel bacterial species and 26 (43%) isolates represented difficult-to-identify organisms.
The anatomical sites of these 61 clinical samples are indicated in Tables 1 and 2. The predominant specimen type was blood culture (n = 9). Detailed microbiological results from the 61 cases, including type of specimen, microscopy, cultured isolates, MALDI-TOF MS, and partial 16S rRNA gene sequencing, are listed in Table S1.
Overall, medical history and information on clinical relevance were available for 47/61 cases. In 15/47 cases, the respective bacterial isolate was considered clinically relevant, and in 21 cases not clinically relevant. In the remaining 11 cases, clinical relevance was unclear. In 3/15 cases classified as clinically relevant, culture growth was monomicrobial. In 2 of these 3 cases, patients had received antibiotics for > 3 days at the time of sample collection.
The age of the 47 patients ranged from 7 to 94 years (median 61 years). Thirty (64%) were male and 17 (36%) female.
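As a quick consistency check, the proportions reported above can be recomputed from the stated counts. The following is a minimal sketch; the variable names and the pct helper are hypothetical, and all counts are copied from the text.

```python
def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer, as reported in the text."""
    return round(100 * part / whole)


TOTAL_ISOLATES = 61            # isolates included in the NOVA study
CASES_WITH_CLINICAL_DATA = 47  # cases with medical history available

# Counts as stated in the Results section.
gram = {"Gram-positive": 41, "Gram-negative": 20}
outcome = {"novel species": 35, "difficult to identify": 26}
relevance = {"relevant": 15, "not relevant": 21, "unclear": 11}
sex = {"male": 30, "female": 17}

# Each breakdown should add up to its denominator.
assert sum(gram.values()) == TOTAL_ISOLATES
assert sum(outcome.values()) == TOTAL_ISOLATES
assert sum(relevance.values()) == CASES_WITH_CLINICAL_DATA
assert sum(sex.values()) == CASES_WITH_CLINICAL_DATA

for label, n in {**gram, **outcome}.items():
    print(f"{label}: {n}/{TOTAL_ISOLATES} = {pct(n, TOTAL_ISOLATES)}%")
for label, n in {**relevance, **sex}.items():
    print(f"{label}: {n}/{CASES_WITH_CLINICAL_DATA} = {pct(n, CASES_WITH_CLINICAL_DATA)}%")
```

All four breakdowns sum to their denominators, and the rounded percentages match those given in the text (67%/33%, 57%/43%, 64%/36%).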

Discussion
We present an innovative algorithm based on WGS for systematic and reliable identification of bacterial isolates that cannot be identified by routine diagnostic methods. Using this algorithm, we collected and analyzed a total of 61 clinical isolates, 35 of which represent potentially novel species. From February 2022 to July 2023, another 21 potentially novel isolates were collected (not presented in this publication).
The idea of this study arose with the introduction of WGS technology in our laboratory. Initially, analysis of the genomes was performed in individual, time-consuming procedures. A milestone was the availability of the web-based TYGS platform in 2019, which allows genomic data to be analyzed in a standardized manner to determine the correct taxonomic species or define the organism as a novel taxon based on WGS data [11]. Our NOVA tool is now integrated in routine diagnostic procedures and is performed weekly. It represents a relatively fast and reliable tool for identifying difficult-to-identify bacterial strains and allows us to discuss the clinical relevance with our infectious disease specialists in a timely manner.
The predominant genus among our 61 NOVA isolates was Corynebacterium with 11 isolates. Five of them were difficult to identify and six represent novel species (Fig. 2). Non-diphtheria corynebacteria are part of the normal microbiota of human skin and mucosa and are therefore very common isolates in clinical samples [17]. This may explain our finding, as well as the fact that none of the 11 Corynebacterium isolates was considered clinically relevant. However, the growing number of immunocompromised patients and the use of invasive devices are accompanied by an increase in infections with opportunistic pathogens [17,18]. For this reason, and because of the differing antibiotic resistance patterns among Corynebacterium sp., the identification of this bacterial group at species level is of great importance [19]. For this purpose, in addition to MALDI-TOF MS analysis, various molecular methods such as PCR-based assays or sequencing of the rpoB and 16S rRNA genes have been described [17,20-23]. However, a recent review by Church and colleagues states that approximately 35% of Corynebacterium sp. cannot be distinguished using 16S rRNA gene sequencing [24]. In these cases, sequencing of the rpoB target may provide additional diversity to distinguish some closely related species [21]. WGS, with its higher resolution, ultimately offers another means of species identification as well as the advantage of being able to describe the entire genome of a potentially new species.
We assume that Vandammella animalimorsus represents a novel and emerging pathogen. Our isolate USB_NOVA_58 originated from a biopsy of a thumb after a dog bite with the clinical diagnosis of septic arthritis and tenosynovitis in 2021. It was identified at that time as a potentially novel organism classified as Corticibacter sp. After reanalysis using the TYGS tool in 2023, the isolate was identified as V. animalimorsus. This novel genus and species was described by Bernard et al. in 2022 using strains provisionally named "CDC group NO-1" recovered from human wound infections following animal bites [25]. Another potential new pathogen is Kingella pumchi. Our strain USB_NOVA_42 was isolated in 2018 from a patient with paronychia and assessed as clinically relevant. At that time it was identified as a novel organism tentatively named "unidentified Neisseria sp.". It was described as "Kingella pumchi" in February 2023 by a Chinese group using a strain that had been isolated from a human vertebral biopsy [26]. A novel Cutibacterium, C. modestum, was identified from a prosthetic hip fluid. We identified this strain (USB_NOVA_51) in 2020 as "Propionibacterium humerusii", a tentatively named species published in 2011. Some weeks afterwards, C. modestum was described by Dekio et al. from an isolate obtained from the meibomian gland [27], showing genome data similar to those of our strain USB_NOVA_51. We then summarized multiple published data on this organism and showed that "P. humerusii" and C. modestum represent the same species and that this bacterium is often misidentified as Cutibacterium acnes [15]. The recently described Gulosibacter hominis (4 isolates) and Pseudoclavibacter triregionum (1 isolate) may represent commensals that are part of the human skin microbiome [13,14].
As a strength of our study, we identified and described novel species from clinical samples, while also providing clinical information and evaluating the clinical relevance of the respective bacterial isolate. In approximately one-third (15/47) of all cases where clinical data were available, the bacterial isolate was considered clinically relevant. However, in 12/15 cases, other concomitant pathogens could be identified as a possible cause of the infection, so determination of their clinical relevance was difficult. Moreover, we did not evaluate antibiotic efficacy or change in antibiotic administration based on strain identification. This is a limitation of this study because the impact on patient care is difficult to assess without this information. In 11/47 cases the clinical relevance of the isolate was unclear. Six of these 11 isolates belong to novel species. This demonstrates the importance of identifying bacterial species and collecting clinical data on patients to gain insight into the role of these species as human pathogens and to better assess their clinical significance in the future.
In our findings, 26 of 61 isolates were difficult to identify at the time of study inclusion when combining MALDI-TOF MS testing with partial 16S rRNA gene sequencing. However, the long collection period limits this classification. Technical advances occurring between study inclusion and reporting may or may not allow for identification with one or both of the methods. Yet our NOVA algorithm was implemented to detect novel species, which led to 35 of 61 strains being classified as such at the time of reporting (August 8, 2023).
Overall, the majority (33/61, 54%) of our isolates were Gram-positive rods, which are generally difficult to identify biochemically. This is consistent with observations from other laboratories. Church and colleagues found that the largest group of organisms to be sequenced were Gram-positive bacilli, which accounted for 48.5% of all isolates sequenced over a six-year period [24].
The implementation of WGS in clinical microbiology for pan-bacterial identification appears to be challenging, and this method is currently performed mainly at large reference and public health laboratories [28,29]. Difficulties arise from the lack of guidelines and standards, as well as financial and technical obstacles [28]. Price and colleagues conducted a study using WGS to identify bacteria in a clinical laboratory, evaluated their clinical relevance, and thereby provided a model for validating and implementing WGS in such a setting. They used a diverse set of 125 bacterial isolates, and were able to identify 100% (89/89) and 89% (79/89) of isolates to genus and species levels, respectively. WGS also provided better results for 71% (25/35) of isolates originally reported at the genus level or descriptively only. In addition, review of patient records showed that improved identification at the genus or species level through WGS may have had a positive impact on patient care. For example, unnecessary or ineffective antibiotic use could be identified, and the results provided assistance with outbreak investigations [28,29]. These benefits of WGS must be weighed against the question of clinical practicability with regard to the long turnaround time (1-2 weeks in Price's study). Faster WGS methods, such as nanopore sequencing, could help overcome this problem [30].
Current literature indicates that the MALDI-TOF MS method identifies approximately 98% of routine clinical isolates to the genus level and > 90% to the species level, with < 1% misidentified [24,31]. This is in line with observations in our laboratory. The vast majority of samples coming through our clinical microbiology laboratory are easily resolvable as species using standard methods. Nevertheless, the identification here of 35 potentially novel taxa, seven of which appear to have had a clinically relevant role, shows that there is still a wide range of undescribed bacterial organisms in clinical samples. Clinical microbiologists and infectious disease specialists should be aware of this spectrum, and we encourage other laboratories to apply or adapt our algorithm to improve the identification of difficult-to-identify isolates. An important next step within our NOVA study will be the correct taxonomic description of these isolates.

Conclusions
To conclude, we developed an algorithm that uses WGS to characterize strains which are not identifiable by standard methods. It allowed the identification of multiple potentially novel taxa as well as difficult-to-identify strains. Public availability of the corresponding genome sequences and detailed clinical information may help to expand the clinical and ecological understanding of these novel bacterial organisms.

Fig. 1
Algorithm for identification of clinical isolates suitable for the NOVA study

Table 1
List of 35 clinical isolates representing novel taxa and corresponding clinical data

Table 2
List of 26 clinical isolates which were identified by using WGS and corresponding clinical data
Abbreviations: ID, identification; y, year; f, female; m, male; SSI, surgical site infection; IVDU, intravenous drug use; N.a., not applicable; ESBL, extended spectrum beta-lactamases